Results 1 - 3 of 3
1.
IEEE Comput Graph Appl ; 37(6): 40-51, 2017.
Article in English | MEDLINE | ID: mdl-29140781

ABSTRACT

An important problem in computer animation of virtual characters is the expression of complex mental states during conversation using the coordinated prosody of voice, rhythm, facial expressions, and head and gaze motion. In this work, the authors propose an expressive conversion method for generating natural speech and facial animation in a variety of recognizable attitudes, using neutral speech and animation as input. Their method works by automatically learning prototypical prosodic contours at the sentence level from an original dataset of dramatic attitudes.

2.
J Acoust Soc Am ; 118(2): 1144-53, 2005 Aug.
Article in English | MEDLINE | ID: mdl-16158668

ABSTRACT

In this paper we present efforts to characterize the three-dimensional (3-D) movements of the right hand and the face of a French female speaker during the audiovisual production of cued speech. The 3-D trajectories of 50 hand and 63 facial flesh points during the production of 238 utterances were analyzed. These utterances were carefully designed to cover all possible diphones of the French language. Linear and nonlinear statistical models of the articulations and postures of the hand and the face were developed using separate and joint corpora. Automatic recognition of hand and face postures at targets was performed to verify a posteriori that the key hand movements and postures imposed by cued speech had been well realized by the subject. The recognition results were further exploited to study the phonetic structure of cued speech, notably the phasing relations between hand gestures and sound production. The hand and face gestural scores are studied with reference to the acoustic segmentation. Finally, a first implementation of a concatenative audiovisual text-to-cued-speech synthesis system is described that exploits this unique and extensive data on cued speech in action.


Subjects
Nonverbal Communication/physiology , Phonetics , Speech Perception/physiology , Acoustic Stimulation/instrumentation , Computer Simulation , Cues , Face/physiology , Female , Hand/physiology , Head/physiology , Humans , Imaging, Three-Dimensional/instrumentation , Manual Communication , Models, Biological , Models, Statistical , Movement/physiology , Photic Stimulation/instrumentation , Sound Spectrography
3.
J Acoust Soc Am ; 115(1): 337-51, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14759026

ABSTRACT

A method is proposed to model the interspeaker variability of formant patterns for oral vowels. It is assumed that this variability originates in the differences among speakers in the respective lengths of their front and back vocal-tract cavities. To characterize these vocal-tract differences between speakers from the spectral description of the acoustic speech signal, each formant is interpreted, according to the concept of formant-cavity affiliation, as a resonance of a specific vocal-tract cavity. Its frequency can thus be directly related to the corresponding cavity length, and a transformation model from a speaker A to a speaker B can be proposed on the basis of the frequency ratios of formants corresponding to the same resonances. To minimize the number of sounds that must be recorded for each speaker to carry out this transformation, the frequency ratios are computed exactly only for the three extreme cardinal vowels [i, a, u] and are approximated for the remaining vowels through an interpolation function. The method is evaluated through its capacity to transform the (F1, F2) formant patterns of eight oral vowels pronounced by five male speakers into the (F1, F2) patterns of the corresponding vowels generated by an articulatory model of the vocal tract. The resulting formant patterns are compared with those provided by normalization techniques published in the literature. The proposed method is found to be efficient, but a number of limitations are also observed and discussed. These limitations can be associated with the formant-cavity affiliation model itself or with a possible influence of speaker-specific vocal-tract geometry in the cross-sectional direction, which the model might not take into account.


Subjects
Individuality , Phonetics , Speech Acoustics , Verbal Behavior , Adult , Analysis of Variance , Humans , Larynx/physiology , Lip/physiology , Male , Models, Theoretical , Mouth/physiology , Pharynx/physiology , Reference Values , Sound Spectrography , Speech Articulation Tests , Verbal Behavior/physiology , Video Recording , Voice Quality/physiology
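The frequency-ratio transformation described in the third abstract can be sketched as follows. This is an illustrative reading only: the paper's actual interpolation function is not given here, so the inverse-distance weighting in the (F1, F2) plane and all formant values below are assumptions for demonstration, not the authors' data or method.

```python
"""Sketch: scale speaker A's (F1, F2) pattern toward speaker B using
per-formant frequency ratios anchored at the cardinal vowels [i, a, u]."""

def cardinal_ratios(src, tgt):
    # Per-vowel, per-formant ratio target/source at the anchor vowels.
    return {v: tuple(t / s for s, t in zip(src[v], tgt[v])) for v in src}

def transform(f1f2, src, tgt, eps=1e-6):
    # Interpolate the anchor ratios with inverse-distance weights in the
    # (F1, F2) plane (an assumed interpolation, not the paper's).
    ratios = cardinal_ratios(src, tgt)
    weights = {}
    for v, (f1, f2) in src.items():
        d = ((f1f2[0] - f1) ** 2 + (f1f2[1] - f2) ** 2) ** 0.5
        weights[v] = 1.0 / (d + eps)
    total = sum(weights.values())
    r1 = sum(weights[v] * ratios[v][0] for v in src) / total
    r2 = sum(weights[v] * ratios[v][1] for v in src) / total
    return (f1f2[0] * r1, f1f2[1] * r2)

# Hypothetical (F1, F2) values in Hz for the three cardinal vowels.
speaker_a = {"i": (290, 2250), "a": (710, 1200), "u": (310, 780)}
speaker_b = {"i": (260, 2400), "a": (680, 1300), "u": (280, 820)}

# Transform an intermediate vowel of speaker A toward speaker B's space.
f1_new, f2_new = transform((500, 1500), speaker_a, speaker_b)
```

By construction, transforming one of speaker A's own cardinal vowels reproduces speaker B's corresponding vowel almost exactly, since the nearest anchor's ratio dominates the interpolation.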